Overview:

This page contains the results of CoNGA analyses. Results in tables may have been filtered to reduce redundancy, focus on the most important columns, and limit length; full tables should exist as OUTFILE_PREFIX*.tsv files.

Command:

/Users/Nick/conga/scripts/run_conga.py --all --gex_data merged_COVID_gex.h5ad --gex_data_type h5ad --clones_file merged_COVID_clones.tsv --organism human --graph_vs_graph --outfile_prefix ./CoNGA.output --no_kpca

Stats

num_cells_w_gex: 11282
num_features_start: 36601
num_cells_w_tcr: 10155
min_genes_per_cell: 200
max_genes_per_cell: 2500
max_percent_mito: 0.1
num_filt_max_genes_per_cell: 3463
num_filt_max_percent_mito: 114
num_TR_genes: 151
num_TR_genes_in_hvg_set: 92
num_highly_variable_genes: 1396
num_cells_after_filtering: 6578
num_clonotypes: 5453
max_clonotype_size: 135
num_singleton_clonotypes: 4864
nbr_frac_for_nndists: 0.01
num_gvg_hit_clonotypes: 80
num_gvg_hit_biclusters: 7

graph_vs_graph_stats


Here we assess overall graph-vs-graph correlation by counting the edges shared between the TCR and GEX neighbor graphs and comparing that observed number to the number we would expect if the graphs were completely uncorrelated. Our null model for uncorrelated graphs is to take the vertices of one graph and randomly renumber them (permute their labels). We compare the observed overlap to that expected under this null model by computing a Z-score, either by permuting the vertices of one graph many times to get a mean and standard deviation of the overlap distribution, or, for large graphs where this is time consuming, by using a regression model for the standard deviation.

The rows of this table correspond to the different graph-graph comparisons made in the conga graph-vs-graph analysis: we compare K-nearest-neighbor graphs for GEX and TCR at different K values ("nbr_frac" aka neighbor-fraction, which reports K as a fraction of the total number of clonotypes) to each other and to GEX and TCR "cluster" graphs in which each clonotype is connected to all the other clonotypes with the same (GEX or TCR) cluster assignment. For two K values (the default), this gives 2*3=6 comparisons: GEX KNN graph vs TCR KNN graph, GEX cluster graph vs TCR KNN graph, and GEX KNN graph vs TCR cluster graph, for each of the two K values (aka nbr_fracs).

The column to look at is *overlap_zscore*. Higher values indicate more significant GEX/TCR covariation, with "interesting" levels starting around zscores of 3-5.
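The label-permutation null described above can be sketched in a few lines of plain Python. This is a minimal illustration, not CoNGA's implementation: the function name `overlap_zscore`, the toy ring graph, and the number of shuffles are all assumptions made for the example.

```python
import random
import statistics

def overlap_zscore(edges_a, edges_b, num_nodes, num_shuffles=100, seed=0):
    """Z-score of the shared directed edges against a label-permutation null."""
    rng = random.Random(seed)
    set_a = set(edges_a)
    observed = len(set_a & set(edges_b))
    null_overlaps = []
    for _ in range(num_shuffles):
        # Randomly renumber the vertices of graph B, then recount shared edges.
        perm = list(range(num_nodes))
        rng.shuffle(perm)
        shuffled_b = {(perm[i], perm[j]) for i, j in edges_b}
        null_overlaps.append(len(set_a & shuffled_b))
    mean = statistics.mean(null_overlaps)
    sdev = statistics.stdev(null_overlaps)
    return observed, mean, sdev, (observed - mean) / sdev

# Toy example (hypothetical data): a ring graph compared with itself, so the
# observed overlap is every edge and the Z-score is large.
n = 60
ring = [(i, (i + d) % n) for i in range(n) for d in (1, 2)]
observed, mean, sdev, z = overlap_zscore(ring, ring, n)
print(observed, round(mean, 2), round(sdev, 2), round(z, 1))
```

Two unrelated graphs would give a Z-score near zero under the same procedure.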

Columns in more detail:

graph_overlap_type: KNN ("nbr") or cluster versus KNN ("nbr") or cluster

nbr_frac: the K value for the KNN graph, as a fraction of total clonotypes

overlap: the observed overlap (number of shared edges) between GEX and TCR graphs

expected_overlap: the expected overlap under a shuffled null model.

overlap_zscore: a Z-score for the observed overlap computed by subtracting the expected overlap and dividing by the standard deviation estimated from shuffling.

overlap expected_overlap overlap_mean overlap_sdev overlap_zscore overlap_zscore_fitted overlap_zscore_source nodes calculation_time calculation_time_fitted gex_edges tcr_edges gex_indegree_variance gex_indegree_skewness gex_indegree_kurtosis tcr_indegree_variance tcr_indegree_skewness tcr_indegree_kurtosis indegree_correlation_R indegree_correlation_P nbr_frac graph_overlap_type
3185 2916.534850 2904.35 59.545508 4.713202 2.830389 shuffling 5453 1.756877 0.222982 294462 294462 1.353783 2.841801 12.638694 0.361744 1.075073 1.956458 -0.016566 0.221274 0.01 gex_nbr_vs_tcr_nbr
27011 25967.918562 26007.54 250.674547 4.003039 2.758392 shuffling 5453 16.985184 2.187768 294462 2621798 1.353783 2.841801 12.638694 0.217628 0.257416 -0.830100 0.023136 0.087583 0.01 gex_nbr_vs_tcr_cluster
34996 33523.619956 33521.38 254.743470 5.788647 4.499293 shuffling 5453 22.326489 2.856535 3384644 294462 0.229432 -0.096729 -1.066463 0.361744 1.075073 1.956458 -0.013610 0.314988 0.01 gex_cluster_vs_tcr_nbr
301911 297079.480007 297100.53 1873.302712 2.567909 2.501584 shuffling 5453 22.269924 22.887008 2971885 2971885 0.858292 1.383529 2.045873 0.276955 0.945580 1.692342 -0.010150 0.453628 0.10 gex_nbr_vs_tcr_nbr
270317 262083.622524 261916.60 1636.190490 5.134121 4.941807 shuffling 5453 19.614371 20.078878 2971885 2621798 0.858292 1.383529 2.045873 0.217628 0.257416 -0.830100 0.025899 0.055823 0.10 gex_nbr_vs_tcr_cluster
345102 338340.238445 338382.72 1230.676286 5.459827 4.470820 shuffling 5453 24.767504 26.216685 3384644 2971885 0.229432 -0.096729 -1.066463 0.276955 0.945580 1.692342 0.022744 0.093077 0.10 gex_cluster_vs_tcr_nbr
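As a quick arithmetic check, the reported overlap_zscore values can be recomputed from the other columns. The sketch below copies three rows from the table above. For these rows (all with overlap_zscore_source = shuffling), the reported value is matched by subtracting overlap_mean (the mean over shuffles), not expected_overlap. That is an observation from the numbers shown, not a statement about CoNGA's code.

```python
# (overlap, overlap_mean, overlap_sdev, reported overlap_zscore), copied from
# the first three rows of the table above.
rows = [
    (3185, 2904.35, 59.545508, 4.713202),
    (27011, 26007.54, 250.674547, 4.003039),
    (34996, 33521.38, 254.743470, 5.788647),
]

def zscore(observed, null_mean, null_sdev):
    """Standard Z-score of an observed count against a null distribution."""
    return (observed - null_mean) / null_sdev

recomputed = [zscore(o, m, s) for o, m, s, _ in rows]
for (_, _, _, reported), z in zip(rows, recomputed):
    print(f"reported={reported:.6f} recomputed={z:.6f}")
```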

graph_vs_graph


Graph vs graph analysis looks for correlation between GEX and TCR space by finding statistically significant overlap between two similarity graphs, one defined by GEX similarity and one by TCR sequence similarity.

Overlap is defined one node (clonotype) at a time by looking for overlap between that node's neighbors in the GEX graph and its neighbors in the TCR graph. The null model is that the two neighbor sets are chosen independently at random.
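If the two neighbor sets are drawn independently at random, the per-node overlap follows a hypergeometric distribution, so one way to sketch the per-clonotype P value is the tail probability below. The function name and the choice of a plain hypergeometric tail are assumptions for illustration. CoNGA may apply further corrections (the results table has a separate overlap_corrected column), so this is not expected to reproduce conga_score exactly.

```python
from math import comb

def neighbor_overlap_pvalue(overlap, num_others, k_gex, k_tcr):
    """P(X >= overlap) when k_tcr TCR neighbors are drawn at random from
    num_others candidate clonotypes, k_gex of which are GEX neighbors."""
    total = comb(num_others, k_tcr)
    tail = sum(
        comb(k_gex, x) * comb(num_others - k_gex, k_tcr - x)
        for x in range(overlap, min(k_gex, k_tcr) + 1)
    )
    return tail / total  # exact big-integer ratio, converted to float

# Illustration with sizes taken from this report: 5453 clonotypes (so 5452
# possible neighbors for any one clonotype) and K = 54 in both graphs.
num_clonotypes = 5453
p7 = neighbor_overlap_pvalue(7, num_clonotypes - 1, 54, 54)
p8 = neighbor_overlap_pvalue(8, num_clonotypes - 1, 54, 54)
# Multiply by the number of clonotypes, as in the conga_score definition.
print(p7 * num_clonotypes, p8 * num_clonotypes)
```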

CoNGA looks at two kinds of graphs: K nearest neighbor (KNN) graphs, where K = neighborhood size is specified as a fraction of the number of clonotypes (defaults for K are 0.01 and 0.1), and cluster graphs, where each clonotype is connected to all the other clonotypes in the same (GEX or TCR) cluster. Overlaps are computed 3 ways (GEX KNN vs TCR KNN, GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN), for each of the K values (called nbr_fracs short for neighbor fractions).
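To make the neighbor-fraction convention concrete, the sketch below converts each default nbr_frac into a neighborhood size K for the 5453 clonotypes in this dataset. Truncating toward zero is an assumption, chosen because it reproduces the num_neighbors values (54 and 545) and the KNN edge counts (294462 and 2971885) reported in the tables on this page. The helper name `knn_size` is hypothetical.

```python
def knn_size(nbr_frac, num_clonotypes):
    """Neighborhood size K for a given neighbor fraction (assumed: truncation)."""
    return max(1, int(nbr_frac * num_clonotypes))

num_clonotypes = 5453  # num_clonotypes from the Stats section
for nbr_frac in (0.01, 0.1):
    k = knn_size(nbr_frac, num_clonotypes)
    # Each clonotype contributes K directed edges to the KNN graph.
    print(nbr_frac, k, k * num_clonotypes)
```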

Columns (depend slightly on whether hit is KNN v KNN or KNN v cluster):

conga_score = P value for GEX/TCR overlap * number of clonotypes

mait_fraction = fraction of the overlap made up of 'invariant' T cells

num_neighbors* = size of neighborhood (K)

cluster_size = size of cluster (for KNN v cluster graph overlaps)

clone_index = 0-index of clonotype in adata object


conga_score num_neighbors_gex num_neighbors_tcr overlap overlap_corrected mait_fraction clone_index nbr_frac graph_overlap_type cluster_size gex_cluster tcr_cluster va ja cdr3a vb jb cdr3b
0.001559 NaN 54.0 14 12 0.000000 3634 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ53*01 CAGRLSGGSNYKLTF TRBV11-2*01 TRBJ1-2*01 CASSLTGNYGYTF
0.003253 54.0 54.0 8 7 0.000000 3528 0.01 gex_nbr_vs_tcr_nbr NaN 2 5 TRAV35*01 TRAJ42*01 CAAVNYGGSQGNLIF TRBV5-1*01 TRBJ2-2*01 CASSPRTGGSTGELFF
0.007744 545.0 NaN 108 108 0.000000 4928 0.10 gex_nbr_vs_tcr_cluster 708.0 0 1 TRAV8-4*01 TRAJ54*01 CAVSDRQGAQKLVF TRBV4-1*01 TRBJ2-1*01 CASRSGWANEQFF
0.015643 545.0 545.0 87 87 0.011494 4966 0.10 gex_nbr_vs_tcr_nbr NaN 2 12 TRAV8-6*01 TRAJ13*01 CAVITSGGYQKVTF TRBV20-1*01 TRBJ1-6*01 CSARDRTESSYNSPLHF
0.023260 NaN 54.0 13 12 0.000000 3556 0.01 gex_cluster_vs_tcr_nbr 257.0 6 5 TRAV35*01 TRAJ42*01 CAGMNYGGSQGNLIF TRBV11-2*01 TRBJ1-2*01 CASSQREGTLYGYTF
0.029243 545.0 545.0 86 86 0.011628 2403 0.10 gex_nbr_vs_tcr_nbr NaN 5 2 TRAV24*01 TRAJ44*01 CAPGTASKLTF TRBV20-1*01 TRBJ1-1*01 CSAREQRDTMNTEAFF
0.051957 NaN 545.0 98 98 0.010204 4361 0.10 gex_cluster_vs_tcr_nbr 652.0 2 12 TRAV8-1*01 TRAJ12*01 CAVTPGADSSYKLIF TRBV20-1*01 TRBJ2-3*01 CSALGVAGMGDGTQYF
0.052461 NaN 54.0 17 16 0.000000 3463 0.01 gex_cluster_vs_tcr_nbr 493.0 5 5 TRAV35*01 TRAJ17*01 CAGQLYKAAGNKLTF TRBV19*01 TRBJ2-3*01 CASSQGGLGVHF
0.054839 NaN 54.0 14 10 0.000000 3532 0.01 gex_cluster_vs_tcr_nbr 204.0 9 5 TRAV35*01 TRAJ42*01 CAGKNYGGSQGNLIF TRBV7-3*01 TRBJ2-3*01 CASSLRGDTQYF
0.055734 54.0 54.0 7 6 0.000000 3533 0.01 gex_nbr_vs_tcr_nbr NaN 9 5 TRAV35*01 TRAJ42*01 CAGKNYGGSQGNLIF TRBV7-3*01 TRBJ1-2*01 CASSPGPGSPYGYTF
0.057226 NaN 54.0 14 10 0.000000 3623 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ53*01 CAGLNSGGSNYKLTF TRBV6-4*01 TRBJ1-2*01 CASSARSGPLAGYTF
0.062457 545.0 NaN 79 73 0.000000 2856 0.10 gex_nbr_vs_tcr_cluster 461.0 5 5 TRAV27*01 TRAJ17*01 CAGAKAAGNKLTF TRBV7-2*01 TRBJ1-6*01 CASSLRTGGDNSPLHF
0.064224 545.0 NaN 105 104 0.000000 4586 0.10 gex_nbr_vs_tcr_cluster 708.0 0 1 TRAV8-3*01 TRAJ23*01 CVIINQGGKLIF TRBV24-1*01 TRBJ1-2*01 CATSKDRVYGYTF
0.066483 NaN 54.0 13 10 0.000000 3538 0.01 gex_cluster_vs_tcr_nbr 203.0 9 5 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV11-2*01 TRBJ1-2*01 CASSSRANGLNGYTF
0.069488 54.0 54.0 6 6 0.000000 3623 0.01 gex_nbr_vs_tcr_nbr NaN 9 5 TRAV35*01 TRAJ53*01 CAGLNSGGSNYKLTF TRBV6-4*01 TRBJ1-2*01 CASSARSGPLAGYTF
0.076880 NaN 54.0 12 10 0.000000 3574 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-2*01 TRBJ1-5*01 CASSYSQGQPQHF
0.078514 NaN 545.0 98 97 0.010204 2480 0.10 gex_cluster_vs_tcr_nbr 652.0 2 2 TRAV25*01 TRAJ38*01 CAGDNAGNNRKLIW TRBV20-1*01 TRBJ1-5*01 CSALNQGQYSNQPQHF
0.096380 545.0 NaN 28 28 0.000000 4963 0.10 gex_nbr_vs_tcr_cluster 122.0 2 12 TRAV8-6*01 TRAJ11*01 CAVSLGPSGYSTLTF TRBV20-1*01 TRBJ2-3*01 CSAIDRGQGDTQYF
0.138730 NaN 54.0 12 11 0.000000 3571 0.01 gex_cluster_vs_tcr_nbr 256.0 6 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV5-5*01 TRBJ2-1*01 CASSPRLAGSSYNEQFF
0.138730 NaN 54.0 12 11 0.000000 3577 0.01 gex_cluster_vs_tcr_nbr 256.0 6 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV7-8*01 TRBJ1-2*01 CASSPRQGAINGYTF
0.171267 NaN 54.0 11 11 0.000000 3547 0.01 gex_cluster_vs_tcr_nbr 256.0 6 5 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASSGRQGALYGYTF
0.174073 545.0 545.0 83 83 0.000000 4563 0.10 gex_nbr_vs_tcr_nbr NaN 0 1 TRAV8-3*01 TRAJ15*01 CAVGGNQAGTALIF TRBV4-2*01 TRBJ1-1*01 CASSQKGARGTEAFF
0.183396 545.0 NaN 76 75 0.000000 1102 0.10 gex_nbr_vs_tcr_cluster 482.0 2 2 TRAV13-2*01 TRAJ8*01 CAENTGFQKLVF TRBV20-1*01 TRBJ1-5*01 CSARIGQDQPQHF
0.185694 545.0 NaN 103 102 0.000000 4944 0.10 gex_nbr_vs_tcr_cluster 708.0 0 1 TRAV8-4*01 TRAJ8*01 CAVSDRLGTGFQKLVF TRBV25-1*01 TRBJ1-5*01 CASSDGVSQPQHF
0.213459 545.0 NaN 102 102 0.000000 5021 0.10 gex_nbr_vs_tcr_cluster 708.0 4 1 TRAV8-6*01 TRAJ32*01 CAVTPMGGATNKLIF TRBV5-5*01 TRBJ1-5*01 CASSPRDSRNQPQHF
0.213459 545.0 NaN 102 102 0.000000 5116 0.10 gex_nbr_vs_tcr_cluster 708.0 10 1 TRAV8-6*01 TRAJ9*01 CAVSGGTGGFKTIF TRBV19*01 TRBJ2-7*01 CASRPTSGSLDEQYF
0.213459 545.0 NaN 102 102 0.000000 5032 0.10 gex_nbr_vs_tcr_cluster 708.0 4 1 TRAV8-6*01 TRAJ37*01 CAVLGTGSSNTGKLIF TRBV6-2*01 TRBJ2-7*01 CASRQTLLGEQYF
0.213459 545.0 NaN 102 102 0.000000 4868 0.10 gex_nbr_vs_tcr_cluster 708.0 4 1 TRAV8-4*01 TRAJ43*01 CAVSAYNNNDMRF TRBV6-2*01 TRBJ2-7*01 CASNGGGAGEDEQYF
0.231893 NaN 545.0 96 95 0.000000 2869 0.10 gex_cluster_vs_tcr_nbr 652.0 2 2 TRAV27*01 TRAJ24*01 CAGARTTDSWGKLQF TRBV20-1*01 TRBJ2-2*01 CSASTSGNTGELFF
0.245305 NaN 545.0 86 86 0.000000 1325 0.10 gex_cluster_vs_tcr_nbr 575.0 3 14 TRAV16*01 TRAJ52*01 CALSGRGGGGTSYGKLTF TRBV18*01 TRBJ2-7*01 CASSPPGTEVQYF
0.260769 NaN 545.0 98 94 0.000000 3464 0.10 gex_cluster_vs_tcr_nbr 652.0 2 5 TRAV35*01 TRAJ17*01 CAGQLYRAAGNKLTF TRBV19*01 TRBJ1-2*01 CASSPAPGQGSIYGYTF
0.267568 545.0 NaN 75 71 0.000000 3633 0.10 gex_nbr_vs_tcr_cluster 460.0 6 5 TRAV35*01 TRAJ53*01 CAGRLSGGSNYKLTF TRBV11-2*01 TRBJ1-2*01 CASSLTGNYGYTF
0.270479 545.0 NaN 27 27 0.000000 4453 0.10 gex_nbr_vs_tcr_cluster 122.0 5 12 TRAV8-1*01 TRAJ50*01 CAVNGKTSYDKVIF TRBV20-1*01 TRBJ1-2*01 CSAPIGRGNYGYTF
0.292462 545.0 NaN 118 118 0.000000 4224 0.10 gex_nbr_vs_tcr_cluster 852.0 8 0 TRAV5*01 TRAJ39*01 CAESIHAGNMLTF TRBV11-2*01 TRBJ2-3*01 CASSLERNAAGADTQYF
0.292820 NaN 54.0 14 9 0.000000 3542 0.01 gex_cluster_vs_tcr_nbr 203.0 9 5 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASITKDRGFGYTF
0.314597 NaN 54.0 14 9 0.000000 3593 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ42*01 CAVMNYGGSQGNLIF TRBV5-1*01 TRBJ1-2*01 CASSAGRGDGYTF
0.314597 NaN 54.0 14 9 0.000000 3533 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ42*01 CAGKNYGGSQGNLIF TRBV7-3*01 TRBJ1-2*01 CASSPGPGSPYGYTF
0.314597 NaN 54.0 14 9 0.000000 3587 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ42*01 CAGRNYGGSQGNLIF TRBV7-3*01 TRBJ2-3*01 CASSPRHGTDTQYF
0.347869 545.0 NaN 77 70 0.000000 3390 0.10 gex_nbr_vs_tcr_cluster 461.0 9 5 TRAV30*01 TRAJ33*01 CGTALSNYQLIW TRBV9*01 TRBJ2-1*01 CASSLLDLRYNEQFF
0.386918 NaN 54.0 13 9 0.000000 3550 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV14*01 TRBJ2-5*01 CASSKRQHSPAETQYF
0.410000 NaN 54.0 12 9 0.000000 3572 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASFRGGVNGYTF
0.452737 545.0 NaN 75 70 0.000000 2855 0.10 gex_nbr_vs_tcr_cluster 461.0 5 5 TRAV27*01 TRAJ16*01 CAGRFSDGQKLLF TRBV5-1*01 TRBJ2-3*01 CASSPPGGSTDTQYF
0.455230 NaN 545.0 139 138 0.000000 77 0.10 gex_cluster_vs_tcr_nbr 1041.0 0 3 TRAV1-2*01 TRAJ9*01 CAVRETGGFKTIF TRBV3-1*01 TRBJ2-5*01 CASSQASGGRETQYF
0.466759 545.0 NaN 117 117 0.000000 1005 0.10 gex_nbr_vs_tcr_cluster 852.0 8 0 TRAV13-2*01 TRAJ3*01 CAEKMRGSSASKIIF TRBV5-5*01 TRBJ2-3*01 CASSGGGWADTQYF
0.499536 NaN 54.0 11 9 0.000000 3575 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-6*01 TRBJ1-2*01 CASSKRGDYGYTF
0.499536 NaN 54.0 11 9 0.000000 3573 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-2*01 TRBJ1-2*01 CASSPTRGALVGYTF
0.499536 NaN 54.0 11 9 0.000000 3567 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV11-2*01 TRBJ1-2*01 CASSPSRGSLGGYTF
0.506932 545.0 545.0 85 80 0.000000 3577 0.10 gex_nbr_vs_tcr_nbr NaN 6 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV7-8*01 TRBJ1-2*01 CASSPRQGAINGYTF
0.515636 545.0 NaN 74 70 0.000000 2440 0.10 gex_nbr_vs_tcr_cluster 461.0 5 5 TRAV25*01 TRAJ21*01 CAATYNFNKFYF TRBV6-1*01 TRBJ2-1*01 CASSLTREQFF
0.569272 NaN 545.0 95 93 0.000000 3439 0.10 gex_cluster_vs_tcr_nbr 652.0 2 5 TRAV35*01 TRAJ13*01 CAGQNSGGYQKVTF TRBV27*01 TRBJ1-5*01 CASSLYGYRGFGQPQHF
Omitted 39 lines

graph_vs_graph_logos


This figure summarizes the results of a CoNGA analysis that produces scores (CoNGA) and clusters. At the top are six 2D UMAP projections of clonotypes in the dataset based on GEX similarity (top left three panels) and TCR similarity (top right three panels), colored from left to right by GEX cluster assignment; CoNGA score; joint GEX:TCR cluster assignment for clonotypes with significant CoNGA scores, using a bicolored disk whose left half indicates GEX cluster and whose right half indicates TCR cluster; TCR cluster; CoNGA; GEX:TCR cluster assignments for CoNGA hits, as in the third panel.

Below are two rows of GEX landscape plots colored by (first row, left) expression of selected marker genes, (second row, left) Z-score normalized and GEX-neighborhood averaged expression of the same marker genes, and (both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for TCR feature descriptions).

GEX and TCR sequence features of CoNGA hits in clusters with 5 or more hits are summarized by a series of logo-style visualizations, from left to right: differentially expressed genes (DEGs); TCR sequence logos showing the V and J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased TCR sequence scores, with red indicating elevated scores and blue indicating decreased scores relative to the rest of the dataset (see CoNGA manuscript Table S3 for score definitions); GEX 'logos' for each cluster consisting of a panel of marker genes shown with red disks colored by mean expression and sized according to the fraction of cells expressing the gene (gene names are given above).

DEG and TCRseq sequence logos are scaled by the adjusted P value of the associations, with full logo height requiring a top adjusted P value below 10^-6. DEGs with fold-change less than 2 are shown in gray. Each cluster is indicated by a bicolored disk colored according to GEX cluster (left half) and TCR cluster (right half). The two numbers above each disk show the number of hits within the cluster (on the left) and the total number of cells in those clonotypes (on the right). The dendrogram at the left shows similarity relationships among the clusters based on connections in the GEX and TCR neighbor graphs.

The choice of which marker genes to use for the GEX umap panels and for the cluster GEX logos can be configured using run_conga.py command line flags or arguments to the conga.plotting.make_logo_plots function.
Image source: ./CoNGA.output_graph_vs_graph_logos.png

tcr_clumping


This table stores the results of the TCR "clumping" analysis, which looks for neighborhoods in TCR space with more TCRs than expected by chance under a simple null model of VDJ rearrangement.

For each TCR in the dataset, we count how many TCRs are within a set of fixed TCRdist radii (defaults: 24, 48, 72, 96), and compare that number to the expected number given the size of the dataset using a Poisson model. This approach is inspired by the ALICE and TCRnet methods.
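The Poisson comparison can be illustrated with the top row of the tcr_clumping table (num_nbrs = 19, expected_num_nbrs = 1.415972e-01). The sketch computes the upper-tail Poisson probability in plain Python. Multiplying by the number of clonotypes and the number of radii is an assumption: it happens to reproduce the pvalue_adj shown for that row, but it is inferred from the numbers, not taken from CoNGA's documentation.

```python
import math

def poisson_sf(k, lam, num_terms=200):
    """P(X >= k) for X ~ Poisson(lam), with each term evaluated in log space."""
    return sum(
        math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
        for j in range(k, k + num_terms)
    )

num_clonotypes = 5453      # from the Stats section
radii = (24, 48, 72, 96)   # default TCRdist radii
num_nbrs, expected_num_nbrs = 19, 1.415972e-01  # top row of the table

raw_pvalue = poisson_sf(num_nbrs, expected_num_nbrs)
# Assumed multiple-testing factor: one test per clonotype per radius.
adjusted = raw_pvalue * num_clonotypes * len(radii)
print(f"{adjusted:.3e}")
```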

Columns:

clump_type = 'global' unless we are optionally looking for TCR clumps within the individual GEX clusters

num_nbrs = neighborhood size (number of other TCRs with TCRdist

clump_type clone_index nbr_radius pvalue_adj num_nbrs expected_num_nbrs raw_count va ja cdr3a vb jb cdr3b clonotype_fdr_value clumping_group clusters_gex clusters_tcr
global 2537 96 1.162056e-29 19 1.415972e-01 64929.00 TRAV25*01 TRAJ54*01 CGAGAQKLVF TRBV20-1*01 TRBJ2-2*01 CSAAPRATGELFF 6.082732e-30 2 9 2
global 1355 48 1.216546e-29 9 8.374272e-04 384.00 TRAV17*01 TRAJ10*01 CATGDTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSGGGGNQPQHF 6.082732e-30 3 2 9
global 2515 96 1.942826e-29 18 1.107992e-01 50816.00 TRAV25*01 TRAJ54*01 CAAGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSAHESGNGYTF 6.476087e-30 2 12 2
global 1354 48 1.152596e-28 9 1.075134e-03 493.00 TRAV17*01 TRAJ10*01 CATGDTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSIGSGNQPQHF 2.881490e-29 3 2 9
global 1362 48 6.535164e-28 8 4.317984e-04 198.00 TRAV17*01 TRAJ10*01 CATTFTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSIGAGNQPQHF 1.307033e-28 3 6 9
global 2533 96 2.133310e-27 18 1.441005e-01 66089.00 TRAV25*01 TRAJ54*01 CAQGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSVSGTGTGGYTF 3.555516e-28 2 2 2
global 2542 96 5.573613e-27 17 1.145901e-01 52545.00 TRAV25*01 TRAJ54*01 CGQGAQKLVF TRBV20-1*01 TRBJ1-1*01 CSAGREGAFF 7.962304e-28 2 5 2
global 1356 48 8.359958e-27 7 1.526560e-04 70.00 TRAV17*01 TRAJ10*01 CATGFTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSGGAGHQPQHF 1.044995e-27 3 2 9
global 2524 96 1.081259e-26 18 1.578114e-01 72364.00 TRAV25*01 TRAJ54*01 CAGGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSAGGSGSYGYTF 1.201399e-27 2 6 2
global 3578 96 2.363396e-26 12 1.685885e-02 7732.00 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASSAKTGALSGYTF 2.363396e-27 1 2 5
global 2539 96 2.818391e-26 19 2.141785e-01 98211.00 TRAV25*01 TRAJ54*01 CGGGAQKLVF TRBV20-1*01 TRBJ2-1*01 CSASSSAYNEQFF 2.562174e-27 2 6 2
global 2529 96 5.095945e-26 15 6.826340e-02 31302.00 TRAV25*01 TRAJ54*01 CAKGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSAFERTANYGYTF 4.246621e-27 2 12 2
global 2531 96 6.991612e-26 19 2.247881e-01 103076.00 TRAV25*01 TRAJ54*01 CAKGAQKLVF TRBV20-1*01 TRBJ2-1*01 CSSGGTAYNEQFF 5.378163e-27 2 9 2
global 2541 96 1.008472e-25 18 1.788518e-01 82012.00 TRAV25*01 TRAJ54*01 CGQGAQKLVF TRBV20-1*01 TRBJ1-1*01 CSAQGWGNTEAFF 7.203374e-27 2 5 2
global 1356 72 1.532959e-25 10 5.506520e-03 2525.00 TRAV17*01 TRAJ10*01 CATGFTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSGGAGHQPQHF 1.044995e-27 3 2 9
global 1364 48 4.903497e-25 8 9.879024e-04 453.00 TRAV17*01 TRAJ10*01 CATVLTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSRGAGNQPQHF 3.064686e-26 3 2 9
global 2530 96 6.218345e-25 19 2.525388e-01 115801.00 TRAV25*01 TRAJ54*01 CAKGAQKLVF TRBV20-1*01 TRBJ2-1*01 CSARSTAYNEQFF 3.657850e-26 2 12 2
global 2520 96 1.707622e-24 15 8.636883e-02 39626.00 TRAV25*01 TRAJ54*01 CAGGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSALRQGWSGYTF 9.486791e-26 2 9 2
global 1353 72 1.818472e-24 10 7.052707e-03 3234.00 TRAV17*01 TRAJ10*01 CATGATGGGNKLTF TRBV19*01 TRBJ1-5*01 CASTLGGGNQPQHF 9.570904e-26 3 3 9
global 2538 96 1.948419e-24 16 1.209384e-01 55456.00 TRAV25*01 TRAJ54*01 CGGGAQKLVF TRBV20-1*01 TRBJ1-3*01 CSAPISGNTIYF 9.742097e-26 2 5 2
global 1362 72 7.919571e-24 10 8.171458e-03 3747.00 TRAV17*01 TRAJ10*01 CATTFTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSIGAGNQPQHF 1.307033e-28 3 6 9
global 1361 72 9.670335e-24 9 3.790230e-03 1738.00 TRAV17*01 TRAJ10*01 CATGTTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASGGAGNQPQHF 4.395607e-25 3 2 9
global 2534 96 1.923645e-23 17 1.857723e-01 85201.00 TRAV25*01 TRAJ54*01 CAQGAQKLVF TRBV20-1*01 TRBJ2-2*01 CSASQSGTGELFF 8.363672e-25 2 5 2
global 2522 96 2.519143e-23 16 1.420947e-01 65193.00 TRAV25*01 TRAJ54*01 CAGGAQKLVF TRBV20-1*01 TRBJ2-1*01 CSAVLTSVDEQFF 1.049643e-24 2 6 2
global 3517 96 3.028393e-23 12 3.063806e-02 14049.00 TRAV35*01 TRAJ42*01 CAALNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASSENSGSLYGYTF 1.211357e-24 1 5 5
global 2524 72 3.548148e-23 11 1.804394e-02 8274.00 TRAV25*01 TRAJ54*01 CAGGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSAGGSGSYGYTF 1.201399e-27 2 6 2
global 1358 48 3.944302e-23 8 1.709747e-03 784.00 TRAV17*01 TRAJ10*01 CATGLTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSMGGGNQPQHF 1.460853e-24 3 6 9
global 1347 48 5.539103e-23 8 1.783894e-03 818.00 TRAV17*01 TRAJ10*01 CAPGLTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSPGGGNQPQHF 1.978251e-24 3 6 9
global 1355 72 7.901115e-23 10 1.028683e-02 4717.00 TRAV17*01 TRAJ10*01 CATGDTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSGGGGNQPQHF 6.082732e-30 3 2 9
global 1355 24 8.029850e-23 4 5.452000e-07 0.25 TRAV17*01 TRAJ10*01 CATGDTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSGGGGNQPQHF 6.082732e-30 3 2 9
global 2521 96 8.221067e-23 15 1.120009e-01 51386.00 TRAV25*01 TRAJ54*01 CAGGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSAQGVEGYGYTF 2.651957e-24 2 9 2
global 1360 48 1.345556e-22 8 1.993251e-03 914.00 TRAV17*01 TRAJ10*01 CATGLTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSRGGGNQPQHF 4.204862e-24 3 2 9
global 2517 96 1.453489e-22 16 1.586990e-01 72771.00 TRAV25*01 TRAJ54*01 CAAGAQKLVF TRBV20-1*01 TRBJ2-3*01 CSASHTQYF 4.404513e-24 2 2 2
global 1364 72 1.813274e-22 10 1.117878e-02 5126.00 TRAV17*01 TRAJ10*01 CATVLTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSRGAGNQPQHF 3.064686e-26 3 2 9
global 1346 72 1.563446e-21 9 6.671067e-03 3059.00 TRAV17*01 TRAJ10*01 CAMPRVGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSIGAGNQPQHF 4.466989e-23 3 5 9
global 1354 72 1.722919e-21 10 1.400510e-02 6422.00 TRAV17*01 TRAJ10*01 CATGDTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSIGSGNQPQHF 2.881490e-29 3 2 9
global 3547 96 1.647751e-20 11 3.156933e-02 14484.00 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASSGRQGALYGYTF 4.453382e-22 1 6 5
global 3223 24 1.694111e-20 5 3.925440e-05 18.00 TRAV29/DV5*01 TRAJ54*01 CAARWEGAQKLVF TRBV20-1*01 TRBJ2-3*01 CSSPSTDTQYF 4.458186e-22 5 5 2
global 1358 72 2.213439e-20 10 1.808537e-02 8293.00 TRAV17*01 TRAJ10*01 CATGLTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSMGGGNQPQHF 1.460853e-24 3 6 9
global 1347 72 2.364439e-20 10 1.820532e-02 8348.00 TRAV17*01 TRAJ10*01 CAPGLTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSPGGGNQPQHF 1.978251e-24 3 6 9
global 1360 72 2.742635e-20 10 1.847792e-02 8473.00 TRAV17*01 TRAJ10*01 CATGLTGGGNKLTF TRBV19*01 TRBJ1-5*01 CASSRGGGNQPQHF 4.204862e-24 3 2 9
global 2516 96 2.695180e-19 14 1.410086e-01 64671.00 TRAV25*01 TRAJ54*01 CAAGAQKLVF TRBV20-1*01 TRBJ2-1*01 CSAFPGTAYNEQFF 6.417096e-21 2 2 2
global 2519 96 4.280827e-19 13 1.022995e-01 46935.00 TRAV25*01 TRAJ54*01 CAGGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSAEAGGANYGYTF 9.918158e-21 2 9 2
global 2539 72 4.363989e-19 10 2.438134e-02 11180.00 TRAV25*01 TRAJ54*01 CGGGAQKLVF TRBV20-1*01 TRBJ2-1*01 CSASSSAYNEQFF 2.562174e-27 2 6 2
global 2523 96 1.541037e-18 13 1.129785e-01 51806.00 TRAV25*01 TRAJ54*01 CAGGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSANDRGGSYGYTF 3.424526e-20 2 6 2
global 3573 96 7.434183e-18 10 3.239731e-02 14883.00 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-2*01 TRBJ1-2*01 CASSPTRGALVGYTF 1.616127e-19 1 9 5
global 2535 96 9.201615e-18 13 1.297816e-01 59511.00 TRAV25*01 TRAJ54*01 CAQGAQKLVF TRBV20-1*01 TRBJ2-3*01 CSAPTRTSTDTQYF 1.957791e-19 2 7 2
global 3516 96 2.320547e-17 9 1.942002e-02 8905.00 TRAV35*01 TRAJ42*01 CAALNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASGGQGKLDGYTF 4.834473e-19 1 2 5
global 2515 72 4.857773e-17 8 9.877212e-03 4530.00 TRAV25*01 TRAJ54*01 CAAGAQKLVF TRBV20-1*01 TRBJ1-2*01 CSAHESGNGYTF 6.476087e-30 2 12 2
global 3541 96 7.721425e-17 8 1.046696e-02 4804.00 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASIGTRGALRGYTF 1.544285e-18 1 5 5
Omitted 297 lines

tcr_clumping_logos


This figure summarizes the results of a CoNGA analysis that produces scores (TCR clumping) and clusters. At the top are six 2D UMAP projections of clonotypes in the dataset based on GEX similarity (top left three panels) and TCR similarity (top right three panels), colored from left to right by GEX cluster assignment; TCR clumping score; joint GEX:TCR cluster assignment for clonotypes with significant TCR clumping scores, using a bicolored disk whose left half indicates GEX cluster and whose right half indicates TCR cluster; TCR cluster; TCR clumping; GEX:TCR cluster assignments for TCR clumping hits, as in the third panel.

Below are two rows of GEX landscape plots colored by (first row, left) expression of selected marker genes, (second row, left) Z-score normalized and GEX-neighborhood averaged expression of the same marker genes, and (both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for TCR feature descriptions).

GEX and TCR sequence features of TCR clumping hits in clusters with 3 or more hits are summarized by a series of logo-style visualizations, from left to right: differentially expressed genes (DEGs); TCR sequence logos showing the V and J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased TCR sequence scores, with red indicating elevated scores and blue indicating decreased scores relative to the rest of the dataset (see CoNGA manuscript Table S3 for score definitions); GEX 'logos' for each cluster consisting of a panel of marker genes shown with red disks colored by mean expression and sized according to the fraction of cells expressing the gene (gene names are given above).

DEG and TCRseq sequence logos are scaled by the adjusted P value of the associations, with full logo height requiring a top adjusted P value below 10^-6. DEGs with fold-change less than 2 are shown in gray. Each cluster is indicated by a bicolored disk colored according to GEX cluster (left half) and TCR cluster (right half). The two numbers above each disk show the number of hits within the cluster (on the left) and the total number of cells in those clonotypes (on the right). The dendrogram at the left shows similarity relationships among the clusters based on connections in the GEX and TCR neighbor graphs.

The choice of which marker genes to use for the GEX umap panels and for the cluster GEX logos can be configured using run_conga.py command line flags or arguments to the conga.plotting.make_logo_plots function.
Image source: ./CoNGA.output_tcr_clumping_logos.png

tcr_db_match


This table stores significant matches between TCRs in adata and TCRs in the file /Users/Nick/conga/conga/data/new_paired_tcr_db_for_matching_nr.tsv

P values of matches are assigned by turning the raw TCRdist score into a P value based on a model of the V(D)J rearrangement process, so matches between TCRs that are very far from germline (for example) are assigned a higher significance.

Columns:

tcrdist: TCRdist distance between the two TCRs (adata query and db hit)

pvalue_adj: raw P value of the match * num query TCRs * num db TCRs

fdr_value: Benjamini-Hochberg FDR value for match

clone_index: index within adata of the query TCR clonotype

db_index: index of the hit in the database being matched

va, ja, cdr3a, vb, jb, cdr3b: V gene, J gene, and CDR3 sequence of the alpha and beta chains of the query TCR

db_XXX: where XXX is a field in the literature database
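For readers unfamiliar with the Benjamini-Hochberg procedure behind fdr_value, here is a minimal pure-Python sketch. The input P values are made-up toy numbers, and the function is a generic textbook version, not CoNGA's own code. The running-minimum step is what makes neighboring matches share the same FDR value, which is consistent with the repeated fdr_value in the table below.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    fdr = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        fdr[i] = running_min
    return fdr

toy_pvalues = [0.01, 0.04, 0.03, 0.5]  # hypothetical raw P values
toy_fdr = benjamini_hochberg(toy_pvalues)
print(toy_fdr)
```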


tcrdist pvalue_adj fdr_value clone_index db_index va cdr3a vb cdr3b ja jb db_cdr3a db_cdr3b db_dataset db_epitope db_epitope_gene db_epitope_species db_ja db_jb db_mhc db_va db_vb db_pmhc db_donor db_is_second_alpha_chain db_db db_mhc_trim barcode
60 0.323830 0.15292 2825 2766 TRAV26-2*01 CILWNAGGTSYGKLTF TRBV29-1*01 CSAEREHNEQFF TRAJ52*01 TRBJ2-1*01 CILPLAGGTSYGKLTF CSVEYRDNEQFF vdjdb:PMID:28636589 NLVPMVATV pp65 CMV TRAJ52*01 TRBJ2-1*01 A*02:01 TRAV26-2*01 TRBV29-1*01 NaN NaN NaN vdjdb A*02 CACAGTATCCAAGTAC-1-6
57 0.584692 0.15292 2475 925 TRAV25*01 CAGPRGNTGKLIF TRBV19*01 CASSWTSNQPQHF TRAJ37*01 TRBJ1-5*01 CAGPGSNTGKLIF CASSIRSSQPQHF vdjdb:PMID:29483513 GILGFVFTL M InfluenzaA TRAJ37*01 TRBJ1-5*01 A*02:01 TRAV25*01 TRBV19*01 NaN NaN NaN vdjdb A*02 ACACCGGCAGGTGGAT-1-6
64 0.611678 0.15292 3325 2667 TRAV3*01 CAVRNNNARLMF TRBV12-3*01 CASSRTEGSEAFF TRAJ31*01 TRBJ1-1*01 CAVRNNNARLMF CASSIVNEAFF vdjdb:PMID:28636592 NLVPMVATV pp65 CMV TRAJ31*01 TRBJ1-1*01 A*02:01 TRAV3*01 TRBV12-4*01 NaN NaN NaN vdjdb A*02 GACTGCGTCCGTACAA-1-16
64 0.611678 0.15292 3325 2668 TRAV3*01 CAVRNNNARLMF TRBV12-3*01 CASSRTEGSEAFF TRAJ31*01 TRBJ1-1*01 CAVRNNNARLMF CASSVVNEAFF vdjdb:PMID:28636592 NLVPMVATV pp65 CMV TRAJ31*01 TRBJ1-1*01 A*02:01 TRAV3*01 TRBV12-4*01 NaN NaN NaN vdjdb A*02 GACTGCGTCCGTACAA-1-16

tcr_db_match_plot


GEX and TCR UMAPs showing the location of significant matches to the literature TCR database
Image source: ./CoNGA.output_tcr_db_match_plot.png

tcr_graph_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed first by a t-test, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P-value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons

mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons

log2enr = log2 fold change of gene in neighborhood (will be positive)

gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores

num_fg = the number of clonotypes in the neighborhood (including center)

mean_fg = the mean value of the feature in the neighborhood

mean_bg = the mean value of the feature outside the neighborhood

feature = the name of the gene

mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR

clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.
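The relationship between log2enr, mean_fg, and mean_bg can be checked numerically. In the first two rows of the table below, log2enr matches the log2 ratio of the two means after each is back-transformed with expm1, which suggests the means are stored on a natural-log(1+x) scale. That scale is an assumption inferred from the numbers, and the helper name is hypothetical.

```python
import math

def log2_enrichment(mean_fg, mean_bg):
    """log2 fold change, assuming both means are on a ln(1 + x) scale."""
    return math.log2(math.expm1(mean_fg) / math.expm1(mean_bg))

# (mean_fg, mean_bg, reported log2enr) from the first two rows of the table.
rows = [
    (1.600675, 0.203020, 4.135562),
    (0.702235, 0.197033, 2.225136),
]
for mean_fg, mean_bg, reported in rows:
    print(f"reported={reported:.6f} recomputed={log2_enrichment(mean_fg, mean_bg):.6f}")
```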


ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction nbr_frac graph_type feature_type
6.434513e-47 6.581906e-150 4.135562 2 11 EPHB6 1.600675 0.203020 174 -1 0.0 0.0 tcr_cluster gex
7.213954e-25 4.616238e-56 2.225136 0 11 EPHB6 0.702235 0.197033 546 64 0.0 0.1 tcr_nbr gex
4.331438e-24 1.797165e-53 2.220297 0 11 EPHB6 0.700948 0.197176 546 5292 0.0 0.1 tcr_nbr gex
8.827140e-24 6.510749e-53 2.195481 0 11 EPHB6 0.694362 0.197909 546 1633 0.0 0.1 tcr_nbr gex
1.148688e-23 1.268277e-52 2.189814 0 11 EPHB6 0.692861 0.198076 546 4084 0.0 0.1 tcr_nbr gex
2.458815e-23 4.185176e-52 2.175033 0 11 EPHB6 0.688956 0.198510 546 4955 0.0 0.1 tcr_nbr gex
2.511826e-23 2.034006e-51 2.176361 0 11 EPHB6 0.689306 0.198471 546 5127 0.0 0.1 tcr_nbr gex
4.588023e-23 4.254087e-51 2.162021 0 11 EPHB6 0.685525 0.198892 546 2594 0.0 0.1 tcr_nbr gex
7.188065e-23 5.630096e-50 2.180852 0 11 EPHB6 0.690492 0.198339 546 2577 0.0 0.1 tcr_nbr gex
1.023740e-22 7.253434e-50 2.148762 0 11 EPHB6 0.682038 0.199280 546 5133 0.0 0.1 tcr_nbr gex
2.826941e-22 9.040627e-50 2.124762 0 11 EPHB6 0.675746 0.199980 546 2184 0.0 0.1 tcr_nbr gex
2.526717e-22 3.156969e-49 2.133865 0 11 EPHB6 0.678130 0.199715 546 583 0.0 0.1 tcr_nbr gex
3.378466e-22 6.635245e-49 2.131313 0 11 EPHB6 0.677461 0.199789 546 4017 0.0 0.1 tcr_nbr gex
5.553557e-22 6.649064e-49 2.112105 0 11 EPHB6 0.672439 0.200348 546 4953 0.0 0.1 tcr_nbr gex
3.471103e-22 2.174351e-48 2.158942 0 11 EPHB6 0.684715 0.198982 546 4625 0.0 0.1 tcr_nbr gex
5.886371e-22 4.522480e-48 2.140555 0 11 EPHB6 0.679883 0.199520 546 4441 0.0 0.1 tcr_nbr gex
6.591540e-22 5.131109e-48 2.129442 0 11 EPHB6 0.676971 0.199844 546 4972 0.0 0.1 tcr_nbr gex
4.695350e-22 5.155877e-48 2.136192 0 11 EPHB6 0.678740 0.199647 546 2714 0.0 0.1 tcr_nbr gex
1.304387e-21 3.472780e-47 2.111536 0 11 EPHB6 0.672290 0.200364 546 1815 0.0 0.1 tcr_nbr gex
1.510764e-21 8.170211e-47 2.126568 0 11 EPHB6 0.676219 0.199927 546 4508 0.0 0.1 tcr_nbr gex
7.333510e-22 8.251196e-47 2.157863 0 11 EPHB6 0.684431 0.199014 546 1592 0.0 0.1 tcr_nbr gex
1.288987e-21 1.768490e-46 2.126255 0 11 EPHB6 0.676137 0.199936 546 1751 0.0 0.1 tcr_nbr gex
6.304879e-21 2.886066e-46 2.033353 0 11 EPHB6 0.652030 0.202619 546 4193 0.0 0.1 tcr_nbr gex
4.202120e-21 3.422495e-46 2.087385 0 11 EPHB6 0.666001 0.201064 546 1269 0.0 0.1 tcr_nbr gex
3.592607e-21 3.849752e-46 2.098357 0 11 EPHB6 0.668855 0.200747 546 3278 0.0 0.1 tcr_nbr gex
2.292967e-21 4.492491e-46 2.118375 0 11 EPHB6 0.674076 0.200166 546 4341 0.0 0.1 tcr_nbr gex
2.592262e-21 9.158380e-46 2.118334 0 11 EPHB6 0.674066 0.200167 546 4200 0.0 0.1 tcr_nbr gex
3.334482e-21 1.421634e-45 2.116450 0 11 EPHB6 0.673573 0.200222 546 1655 0.0 0.1 tcr_nbr gex
3.068444e-21 1.658153e-45 2.116613 0 11 EPHB6 0.673616 0.200217 546 1723 0.0 0.1 tcr_nbr gex
7.541337e-21 2.371486e-45 2.103075 0 11 EPHB6 0.670084 0.200610 546 4424 0.0 0.1 tcr_nbr gex
8.358473e-21 2.764192e-45 2.079882 0 11 EPHB6 0.664053 0.201281 546 4642 0.0 0.1 tcr_nbr gex
5.980132e-21 4.411499e-45 2.098973 0 11 EPHB6 0.669015 0.200729 546 4583 0.0 0.1 tcr_nbr gex
8.853551e-21 4.724602e-45 2.078449 0 11 EPHB6 0.663681 0.201322 546 152 0.0 0.1 tcr_nbr gex
4.626546e-21 8.154887e-45 2.116880 0 11 EPHB6 0.673686 0.200209 546 3647 0.0 0.1 tcr_nbr gex
9.235551e-21 1.550306e-44 2.087915 0 11 EPHB6 0.666139 0.201049 546 3334 0.0 0.1 tcr_nbr gex
9.246077e-21 1.982236e-44 2.104017 0 11 EPHB6 0.670330 0.200583 546 4456 0.0 0.1 tcr_nbr gex
1.067582e-20 9.269751e-44 2.103080 0 11 EPHB6 0.670085 0.200610 546 4787 0.0 0.1 tcr_nbr gex
1.728937e-20 1.164808e-43 2.098521 0 11 EPHB6 0.668898 0.200742 546 3661 0.0 0.1 tcr_nbr gex
3.546118e-20 1.989679e-43 2.044120 0 11 EPHB6 0.654803 0.202310 546 2170 0.0 0.1 tcr_nbr gex
3.666719e-20 2.000183e-43 2.045733 0 11 EPHB6 0.655219 0.202264 546 4347 0.0 0.1 tcr_nbr gex
6.372731e-20 5.427646e-43 2.028070 0 11 EPHB6 0.650672 0.202770 546 4087 0.0 0.1 tcr_nbr gex
3.466266e-20 1.275679e-42 2.075526 0 11 EPHB6 0.662923 0.201407 546 2411 0.0 0.1 tcr_nbr gex
3.894790e-20 1.612408e-42 2.079921 0 11 EPHB6 0.664063 0.201280 546 499 0.0 0.1 tcr_nbr gex
7.704665e-20 1.772291e-42 2.060761 0 11 EPHB6 0.659100 0.201832 546 4594 0.0 0.1 tcr_nbr gex
9.574264e-20 3.048483e-42 2.052773 0 11 EPHB6 0.657036 0.202062 546 3319 0.0 0.1 tcr_nbr gex
5.213354e-20 4.569599e-42 2.082826 0 11 EPHB6 0.664817 0.201196 546 2703 0.0 0.1 tcr_nbr gex
6.460939e-20 5.411154e-42 2.069995 0 11 EPHB6 0.661489 0.201566 546 4196 0.0 0.1 tcr_nbr gex
9.567885e-20 8.089639e-42 2.046581 0 11 EPHB6 0.655438 0.202240 546 5271 0.0 0.1 tcr_nbr gex
7.448545e-20 8.155377e-42 2.056720 0 11 EPHB6 0.658055 0.201948 546 2123 0.0 0.1 tcr_nbr gex
8.510634e-20 9.874114e-42 2.074825 0 11 EPHB6 0.662741 0.201427 546 2728 0.0 0.1 tcr_nbr gex
Omitted 1129 lines

tcr_graph_vs_gex_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_tcr_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: ./CoNGA.output_tcr_graph_vs_gex_features_plot.png
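
The labeling rule described above (rank by 'mwu_pvalue_adj', apply the display threshold of 1.0, keep at most one feature per neighborhood) can be sketched with pandas. This is an illustration under stated assumptions, not the CoNGA plotting code: the function name is hypothetical, the toy rows are made-up values rather than results from this run, the threshold is assumed to be inclusive, and cluster-level rows (clone_index of -1) are assumed to be distinguished by their cluster columns.

```python
import pandas as pd


def top_feature_per_neighborhood(results, score_col="mwu_pvalue_adj", threshold=1.0):
    """Keep rows passing the display threshold, then the best row per neighborhood."""
    passing = results[results[score_col] <= threshold]
    ranked = passing.sort_values(score_col)
    # clone_index == -1 marks cluster-defined graphs, so the cluster columns
    # are included in the key to keep those neighborhoods distinct (assumption).
    keys = ["graph_type", "clone_index", "gex_cluster", "tcr_cluster"]
    return ranked.drop_duplicates(subset=keys, keep="first")


# Toy rows (illustrative values only, not taken from the tables on this page).
toy = pd.DataFrame({
    "feature": ["GENE_A", "GENE_B", "GENE_A", "GENE_C"],
    "mwu_pvalue_adj": [1e-20, 1e-5, 3e-4, 2.5],
    "clone_index": [12, 12, -1, 40],
    "gex_cluster": [0, 0, 2, 1],
    "tcr_cluster": [11, 11, 11, 3],
    "graph_type": ["tcr_nbr", "tcr_nbr", "tcr_cluster", "tcr_nbr"],
})
labels = top_feature_per_neighborhood(toy)
```

In the toy input, clonotype 12 has two passing features and keeps only the better-scoring one, and the row with an adjusted P value of 2.5 is dropped by the threshold.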

tcr_graph_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: ./CoNGA.output_tcr_graph_vs_gex_features_panels.png
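
The panel selection rule (at most 3 features per (GEX,TCR) cluster pair, ranked by 'mwu_pvalue_adj') can be sketched as follows. This is a minimal sketch, not the CoNGA implementation: the function name is hypothetical, the toy table uses made-up values, and collapsing repeated hits of the same feature within a cluster pair to its best score is an assumption.

```python
import pandas as pd


def top_features_per_cluster_pair(results, max_features=3, score_col="mwu_pvalue_adj"):
    """Pick up to max_features distinct features for each (GEX, TCR) cluster pair."""
    ranked = results.sort_values(score_col)
    # Many neighborhoods can report the same feature; keep its best score only.
    best = ranked.drop_duplicates(subset=["gex_cluster", "tcr_cluster", "feature"])
    return best.groupby(["gex_cluster", "tcr_cluster"], sort=False).head(max_features)


# Toy table (illustrative values only).
toy_hits = pd.DataFrame({
    "feature": ["A", "A", "B", "C", "D", "E"],
    "mwu_pvalue_adj": [1e-10, 1e-8, 1e-6, 1e-4, 1e-2, 1e-3],
    "gex_cluster": [0, 0, 0, 0, 0, 5],
    "tcr_cluster": [11, 11, 11, 11, 11, 5],
})
panel_features = top_features_per_cluster_pair(toy_hits)
```

Here cluster pair (0, 11) has four distinct features, so the weakest one (D) is left out, while pair (5, 5) contributes its single feature.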

tcr_genes_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed by a ttest first, for speed, and then by a mannwhitneyu test for nbrhood/score combinations whose ttest P-value passes an initial threshold (default is 10* the pvalue threshold).
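
The two-stage testing scheme described above can be sketched with SciPy. This is a hedged illustration rather than the CoNGA code: the function name, the default pvalue_threshold of 1.0, the use of Welch's t-test, and the two-sided Mann-Whitney alternative are all assumptions. Only the overall flow (t-test first, Mann-Whitney U for combinations passing 10 times the P-value threshold, P values multiplied by the number of comparisons) comes from the description.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind


def two_stage_test(scores, nbrhood_mask, num_comparisons,
                   pvalue_threshold=1.0, ttest_factor=10.0):
    """Fast t-test screen, then a Mann-Whitney U test for neighborhoods that pass."""
    fg = scores[nbrhood_mask]    # scores inside the neighborhood
    bg = scores[~nbrhood_mask]   # scores in the rest of the dataset
    t_stat, t_p = ttest_ind(fg, bg, equal_var=False)
    result = {
        "ttest_stat": t_stat,
        "ttest_pvalue_adj": t_p * num_comparisons,
        "mwu_pvalue_adj": None,  # stays None if the screen is not passed
    }
    if result["ttest_pvalue_adj"] <= ttest_factor * pvalue_threshold:
        _, mwu_p = mannwhitneyu(fg, bg, alternative="two-sided")
        result["mwu_pvalue_adj"] = mwu_p * num_comparisons
    return result


# Deterministic toy example: the first 10 values form the neighborhood.
toy_scores = np.concatenate([np.arange(10.0, 20.0), np.arange(0.0, 10.0)])
toy_mask = np.arange(toy_scores.size) < 10
toy_result = two_stage_test(toy_scores, toy_mask, num_comparisons=100)
```

Because the adjustment is a plain multiplication with no cap, adjusted P values greater than 1 can occur, as seen in several rows of the tables on this page.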

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons
mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons
log2enr = log2 fold change of gene in neighborhood (will be positive)
gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores
num_fg = the number of clonotypes in the neighborhood (including center)
mean_fg = the mean value of the feature in the neighborhood
mean_bg = the mean value of the feature outside the neighborhood
feature = the name of the gene
mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR
clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.

In this analysis the TCR graph is defined by connecting all clonotypes that have the same VA/JA/VB/JB gene segment (the analysis is run four times, once for each gene segment type).
ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction gene_segment graph_type feature_type
2.160722e-55 1.102534e-176 4.180570 0 11 EPHB6 1.586726 0.194247 209 -1 0.0 TRBV30 tcr_genes gex
1.246850e-09 2.389501e-15 1.582878 5 5 TOX2 1.383366 0.691917 209 -1 0.0 TRAV35 tcr_genes gex
1.399017e-05 2.098363e-10 1.338706 5 5 PDCD1 1.144191 0.613055 209 -1 0.0 TRAV35 tcr_genes gex
1.726945e-06 1.762830e-09 1.063651 9 5 ITM2A 2.136733 1.520522 209 -1 0.0 TRAV35 tcr_genes gex
9.971771e-05 1.960829e-09 0.472631 2 5 PFN1 3.250053 2.937367 209 -1 0.0 TRAV35 tcr_genes gex
6.372944e-01 1.466391e-08 1.523958 2 5 POU2AF1 0.353385 0.137495 209 -1 0.0 TRAV35 tcr_genes gex
3.738305e-01 7.032901e-08 1.447604 5 5 GFOD1 0.436406 0.182819 209 -1 0.0 TRAV35 tcr_genes gex
3.964167e-07 9.195374e-08 0.664510 9 5 GAPDH 2.714607 2.292019 209 -1 0.0 TRAV35 tcr_genes gex
1.621381e+00 2.018497e-07 1.504097 5 5 CPM 0.347650 0.136773 209 -1 0.0 TRAV35 tcr_genes gex
4.861575e-04 5.318717e-07 1.075764 5 5 SH2D1A 1.011538 0.604399 209 -1 0.0 TRAV35 tcr_genes gex
8.640144e-04 7.642893e-07 0.710713 2 5 COTL1 1.977713 1.569515 209 -1 0.0 TRAV35 tcr_genes gex
1.562692e-03 2.259247e-06 1.070212 2 5 TOX 0.933333 0.550914 209 -1 0.0 TRAV35 tcr_genes gex
5.629991e-01 3.654609e-06 1.219031 2 5 IGFBP4 0.569600 0.284969 209 -1 0.0 TRAV35 tcr_genes gex
3.310230e-01 4.265566e-06 1.311902 6 5 LIMS2 0.464947 0.213839 209 -1 0.0 TRAV35 tcr_genes gex
1.667539e-01 1.070222e-05 1.106745 2 5 ZNRF1 0.555790 0.296507 209 -1 0.0 TRAV35 tcr_genes gex
5.524469e-01 4.223627e-05 1.125500 2 5 AC004585.1 0.572149 0.302970 209 -1 0.0 TRAV35 tcr_genes gex
8.470788e+00 1.545289e-04 1.462360 2 5 AC055839.2 0.298524 0.118885 209 -1 0.0 TRAV35 tcr_genes gex
1.961510e-01 1.814727e-04 1.078025 5 5 AC012645.3 0.709090 0.398039 209 -1 0.0 TRAV35 tcr_genes gex
3.478450e-01 1.511312e-03 0.967769 2 5 TBC1D4 0.750614 0.452211 209 -1 0.0 TRAV35 tcr_genes gex
7.312942e-01 2.226700e-03 0.661492 2 5 RGS10 1.334872 1.018817 209 -1 0.0 TRAV35 tcr_genes gex
5.922762e-02 3.674933e-03 0.820943 9 5 C9orf16 1.249345 0.878971 209 -1 0.0 TRAV35 tcr_genes gex
2.081965e+00 5.159438e-03 0.990852 5 5 CARHSP1 0.502462 0.284027 209 -1 0.0 TRAV35 tcr_genes gex
4.160681e-01 1.193812e-02 0.398563 5 5 PTPRCAP 2.507068 2.256410 209 -1 0.0 TRAV35 tcr_genes gex
3.405594e-01 2.007561e-02 0.716250 5 5 PPP1CC 1.380353 1.033757 209 -1 0.0 TRAV35 tcr_genes gex
1.281349e-01 2.886413e-02 0.470399 5 2 C9orf16 1.070960 0.868979 653 -1 0.0 TRBV20-1 tcr_genes gex
3.953287e+00 3.154649e-02 0.449065 5 5 GNAS 2.035280 1.770614 209 -1 0.0 TRAV35 tcr_genes gex
1.325582e+00 3.619827e-02 0.415988 2 5 RAC2 2.293132 2.037972 209 -1 0.0 TRAV35 tcr_genes gex
6.295522e-01 4.773997e-02 0.750620 2 5 CTSA 0.821483 0.563677 209 -1 0.0 TRAV35 tcr_genes gex
5.932935e-01 6.907922e-02 0.841067 9 5 RNF19A 1.027858 0.694179 209 -1 0.0 TRAV35 tcr_genes gex
3.437126e-03 1.013135e-01 0.348889 2 5 ACTG1 3.297982 3.066210 209 -1 0.0 TRAV35 tcr_genes gex
2.173031e-01 1.255302e-01 0.287927 5 2 GAPDH 2.467558 2.286538 653 -1 0.0 TRBV20-1 tcr_genes gex
5.607080e+00 1.337847e-01 0.868572 5 5 TIGIT 0.869745 0.564894 209 -1 0.0 TRAV35 tcr_genes gex
8.509043e+00 1.686480e-01 0.479211 2 5 ARPC3 1.550212 1.298341 209 -1 0.0 TRAV35 tcr_genes gex
3.517686e+00 1.739375e-01 0.896941 6 5 TOX2 1.063910 0.702651 238 -1 0.0 TRAJ42 tcr_genes gex
3.875348e-01 1.848713e-01 0.604946 9 5 GYPC 1.437789 1.135094 209 -1 0.0 TRAV35 tcr_genes gex
3.873605e+00 1.943827e-01 0.679721 0 0 EPHB6 0.350549 0.232780 687 -1 0.0 TRBJ2-7 tcr_genes gex
2.914961e+00 2.918327e-01 0.555523 2 2 TOX2 0.904109 0.693157 653 -1 0.0 TRBV20-1 tcr_genes gex
2.844448e+00 3.022383e-01 0.245505 2 5 ACTB 4.431882 4.263914 209 -1 0.0 TRAV35 tcr_genes gex
6.729548e-01 3.244829e-01 0.441047 2 5 LAT 1.993099 1.734968 209 -1 0.0 TRAV35 tcr_genes gex
4.504267e+00 3.321344e-01 0.392293 5 5 MYL6 2.089363 1.855391 209 -1 0.0 TRAV35 tcr_genes gex
5.484722e-01 4.005536e-01 0.654085 2 5 PARK7 0.996310 0.735042 209 -1 0.0 TRAV35 tcr_genes gex
4.194196e+00 6.110817e-01 0.727211 9 5 HIF1A 0.978077 0.694325 209 -1 0.0 TRAV35 tcr_genes gex
6.821756e-02 2.206058e+00 0.423667 5 5 LIMD2 2.125738 1.872003 209 -1 0.0 TRAV35 tcr_genes gex
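
The gene-segment graph used for this table can be pictured as a grouping of clonotypes by segment name, with each group acting as one "neighborhood" (which is why clone_index is -1 in every row). The sketch below shows that grouping for a single segment type. The function name and the toy assignments are hypothetical, and how CoNGA stores the segment assignments is not shown on this page.

```python
from collections import defaultdict


def gene_segment_groups(segments):
    """Map each gene segment name to the indices of the clonotypes that use it."""
    groups = defaultdict(list)
    for clone_index, segment in enumerate(segments):
        groups[segment].append(clone_index)
    return dict(groups)


# Toy VB assignments for five clonotypes (illustrative only).
toy_vb = ["TRBV30", "TRBV20-1", "TRBV30", "TRBV20-1", "TRBV30"]
vb_groups = gene_segment_groups(toy_vb)
```

Repeating the grouping for VA, JA, VB and JB gives the four runs mentioned above, and the size of each group corresponds to num_fg in the table.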

tcr_genes_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: ./CoNGA.output_tcr_genes_vs_gex_features_panels.png

gex_graph_vs_tcr_features


This table has results from a graph-vs-features analysis in which we look at the distribution of a set of TCR-defined features over the GEX neighbor graph. We look for neighborhoods in the graph that have biased score distributions, as assessed by a ttest first, for speed, and then by a mannwhitneyu test for nbrhood/score combinations whose ttest P-value passes an initial threshold (default is 10* the pvalue threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a tcr feature.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons
ttest_stat = ttest statistic (sign indicates whether the feature is up or down)
mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons
gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores
num_fg = the number of clonotypes in the neighborhood (including center)
mean_fg = the mean value of the feature in the neighborhood
mean_bg = the mean value of the feature outside the neighborhood
feature = the name of the TCR score
mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR
clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.


ttest_pvalue_adj ttest_stat mwu_pvalue_adj gex_cluster tcr_cluster num_fg mean_fg mean_bg feature mait_fraction clone_index nbr_frac graph_type feature_type
0.028964 5.624592 3.025280e-12 2 5 546 0.106227 0.030772 TRAV35 0.0 3903 0.1 gex_nbr tcr
0.028964 5.624592 3.025280e-12 5 5 546 0.106227 0.030772 TRAV35 0.0 422 0.1 gex_nbr tcr
0.053174 5.512982 2.340168e-11 5 5 546 0.104396 0.030976 TRAV35 0.0 2562 0.1 gex_nbr tcr
0.053174 5.512982 2.340168e-11 5 5 546 0.104396 0.030976 TRAV35 0.0 1960 0.1 gex_nbr tcr
0.097223 5.400208 1.714195e-10 2 5 546 0.102564 0.031180 TRAV35 0.0 3873 0.1 gex_nbr tcr
0.097223 5.400208 1.714195e-10 2 5 546 0.102564 0.031180 TRAV35 0.0 2837 0.1 gex_nbr tcr
0.177018 5.286225 1.189110e-09 5 5 546 0.100733 0.031384 TRAV35 0.0 1620 0.1 gex_nbr tcr
0.177018 5.286225 1.189110e-09 5 5 546 0.100733 0.031384 TRAV35 0.0 5031 0.1 gex_nbr tcr
0.177018 5.286225 1.189110e-09 5 5 546 0.100733 0.031384 TRAV35 0.0 3055 0.1 gex_nbr tcr
0.177018 5.286225 1.189110e-09 5 5 546 0.100733 0.031384 TRAV35 0.0 2603 0.1 gex_nbr tcr
0.177018 5.286225 1.189110e-09 2 5 546 0.100733 0.031384 TRAV35 0.0 2418 0.1 gex_nbr tcr
0.177018 5.286225 1.189110e-09 2 5 546 0.100733 0.031384 TRAV35 0.0 1966 0.1 gex_nbr tcr
0.177018 5.286225 1.189110e-09 2 5 546 0.100733 0.031384 TRAV35 0.0 3408 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 2 5 546 0.098901 0.031588 TRAV35 0.0 4123 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 2 5 546 0.098901 0.031588 TRAV35 0.0 2688 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 2 5 546 0.098901 0.031588 TRAV35 0.0 594 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 5 5 546 0.098901 0.031588 TRAV35 0.0 4383 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 5 5 546 0.098901 0.031588 TRAV35 0.0 3381 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 5 5 546 0.098901 0.031588 TRAV35 0.0 5201 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 2 5 546 0.098901 0.031588 TRAV35 0.0 168 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 5 5 546 0.098901 0.031588 TRAV35 0.0 1153 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 5 5 546 0.098901 0.031588 TRAV35 0.0 3134 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 5 5 546 0.098901 0.031588 TRAV35 0.0 1515 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 5 5 546 0.098901 0.031588 TRAV35 0.0 3321 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 2 5 546 0.098901 0.031588 TRAV35 0.0 1129 0.1 gex_nbr tcr
0.320903 5.170984 7.811775e-09 2 5 546 0.098901 0.031588 TRAV35 0.0 505 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 2373 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 6 5 546 0.097070 0.031791 TRAV35 0.0 3588 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 5 5 546 0.097070 0.031791 TRAV35 0.0 1660 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 5 5 546 0.097070 0.031791 TRAV35 0.0 4506 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 4841 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 1735 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 3801 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 3860 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 5 5 546 0.097070 0.031791 TRAV35 0.0 3622 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 3583 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 4067 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 3879 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 3314 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 5009 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 5 5 546 0.097070 0.031791 TRAV35 0.0 4629 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 6 5 546 0.097070 0.031791 TRAV35 0.0 319 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 2759 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 3536 0.1 gex_nbr tcr
0.579116 5.054433 4.860299e-08 2 5 546 0.097070 0.031791 TRAV35 0.0 475 0.1 gex_nbr tcr
0.303934 3.924061 5.467324e-08 9 5 206 0.126214 0.034877 TRAV35 0.0 -1 0.0 gex_cluster tcr
1.040202 4.936519 2.864064e-07 2 5 546 0.095238 0.031995 TRAV35 0.0 4864 0.1 gex_nbr tcr
1.040202 4.936519 2.864064e-07 5 5 546 0.095238 0.031995 TRAV35 0.0 2541 0.1 gex_nbr tcr
1.040202 4.936519 2.864064e-07 2 5 546 0.095238 0.031995 TRAV35 0.0 1398 0.1 gex_nbr tcr
1.040202 4.936519 2.864064e-07 5 5 546 0.095238 0.031995 TRAV35 0.0 3222 0.1 gex_nbr tcr
Omitted 300 lines

gex_graph_vs_tcr_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_gex_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: ./CoNGA.output_gex_graph_vs_tcr_features_plot.png

gex_graph_vs_tcr_features_panels


Graph-versus-feature analysis was used to identify a set of TCR features that showed biased distributions in GEX neighborhoods. This plot shows the distribution of the top-scoring TCR features on the GEX UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: ./CoNGA.output_gex_graph_vs_tcr_features_panels.png

graph_vs_features_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=54 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).


Image source: ./CoNGA.output_graph_vs_features_gex_clustermap.png
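
The score processing described for this clustermap (Z-score normalization, averaging over the K nearest neighbors, and downscaling of same-modality features by rescale_factor_for_self_features=0.33) can be sketched with NumPy. This is an illustration under assumptions, not the CoNGA code: the function name is hypothetical, and including the center clonotype in the average is an assumption.

```python
import numpy as np


def nbr_averaged_zscores(scores, nbrs, is_self_feature,
                         rescale_factor_for_self_features=0.33):
    """Z-score a feature, average over each clonotype's K neighbors, then rescale."""
    z = (scores - scores.mean()) / scores.std()
    # nbrs has shape (num_clonotypes, K); average the center with its K neighbors.
    averaged = (z + z[nbrs].sum(axis=1)) / (nbrs.shape[1] + 1)
    if is_self_feature:
        # Same-modality features vary more over the graph, so they are shrunk.
        averaged = averaged * rescale_factor_for_self_features
    return averaged


# Toy input: four clonotypes, one neighbor each (illustrative only).
toy_feature = np.array([0.0, 1.0, 2.0, 3.0])
toy_nbrs = np.array([[1], [0], [3], [2]])
other_modality = nbr_averaged_zscores(toy_feature, toy_nbrs, is_self_feature=False)
same_modality = nbr_averaged_zscores(toy_feature, toy_nbrs, is_self_feature=True)
```

The factor of 3.03 quoted for reading the colormap is the reciprocal of the rescale factor (1 / 0.33 is approximately 3.03), which undoes the downscaling for same-modality features.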

graph_vs_features_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=54 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: ./CoNGA.output_graph_vs_features_tcr_clustermap.png

graph_vs_summary


Summary figure for the graph-vs-graph and graph-vs-features analyses.
Image source: ./CoNGA.output_graph_vs_summary.png

gex_clusters_tcrdist_trees


These are TCRdist hierarchical clustering trees for the GEX clusters (cluster assignments stored in adata.obs['clusters_gex']). The trees are colored by CoNGA score with a color score range of 5.45e+01 (blue) to 5.45e-08 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: ./CoNGA.output_gex_clusters_tcrdist_trees.png

conga_threshold_tcrdist_tree


This is a TCRdist hierarchical clustering tree for the clonotypes with CoNGA score less than 10.0. The tree is colored by CoNGA score with a color score range of 1.00e+01 (blue) to 1.00e-08 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: ./CoNGA.output_conga_threshold_tcrdist_tree.png
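
The coloring transform is described only loosely above (log-transform, negate, square-root, plus an offset), so the following is a rough sketch and not the formula CoNGA uses. The function name is hypothetical, and the offset is assumed to be chosen so that the largest score in the color range maps to zero. The default bounds are the 1.00e+01 and 1.00e-08 limits quoted for this tree.

```python
import math


def conga_score_to_color_value(score, max_score=10.0, min_score=1e-8):
    """Map a CoNGA score to a color value: larger values mean more significant."""
    clipped = min(max(score, min_score), max_score)
    # Dividing by max_score plays the role of the offset (assumption), so the
    # blue end of the range maps to 0 and smaller scores map to larger values.
    return math.sqrt(max(0.0, -math.log10(clipped / max_score)))


blue_end = conga_score_to_color_value(10.0)
red_end = conga_score_to_color_value(1e-8)
```

The square root compresses the most significant scores, so differences among moderately significant clonotypes remain visible in the tree coloring.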

hotspot_features


Find GEX (TCR) features that show a biased distribution across the TCR (GEX) neighbor graph, using a simplified version of the Hotspot method from the Yosef lab.

DeTomaso, D., & Yosef, N. (2021). "Hotspot identifies informative gene modules across modalities of single-cell genomics." Cell Systems, 12(5), 446–456.e9.

PMID:33951459

Columns:

Z: HotSpot Z statistic

pvalue_adj: Raw P value times the number of tests (crude Bonferroni correction)

nbr_frac: The K NN nbr fraction used for the neighbor graph construction (nbr_frac = 0.1 means K=0.1*num_clonotypes neighbors)


Z pvalue_adj feature feature_type nbr_frac
90.702586 0.000000e+00 EPHB6 gex 0.10
61.890507 0.000000e+00 EPHB6 gex 0.01
26.819071 1.936477e-156 TRAV35 tcr 0.10
20.649554 9.600342e-91 TOX2 gex 0.01
15.758733 5.981996e-54 nndists_tcr tcr 0.10
15.597248 7.405479e-51 PDCD1 gex 0.01
14.516311 9.551402e-46 TRAV35 tcr 0.01
13.760419 4.296737e-39 ITM2A gex 0.01
13.512100 1.292856e-37 gex_cluster9 gex 0.01
12.904767 4.122639e-34 VIM gex 0.01
11.882627 1.457151e-30 TRBV20-1 tcr 0.10
12.240246 1.846071e-30 TOX2 gex 0.10
12.199470 3.048435e-30 TIGIT gex 0.01
11.756256 6.389983e-28 PDCD1 gex 0.10
10.938449 7.354773e-24 TOX gex 0.01
10.335233 4.757278e-21 C9orf16 gex 0.01
10.296214 7.141312e-21 PFN1 gex 0.01
10.211318 1.719339e-20 C9orf16 gex 0.10
10.193011 2.076062e-20 PVALB gex 0.10
10.138774 3.622198e-20 ITM2A gex 0.10
9.904906 3.861975e-19 LIMS2 gex 0.01
9.182800 4.200256e-18 rim tcr 0.10
9.603399 7.537057e-18 PVALB gex 0.01
9.557198 1.178908e-17 TPT1 gex 0.01
8.910472 5.081583e-17 TRAJ42 tcr 0.10
8.902257 5.472241e-17 tcr_cluster5 tcr 0.10
8.849833 8.765063e-17 surface tcr 0.10
9.295651 1.425624e-16 SH2D1A gex 0.01
8.997220 2.255798e-15 COTL1 gex 0.01
8.808661 1.234072e-14 CPM gex 0.01
8.633401 5.802951e-14 AC004585.1 gex 0.01
8.548245 1.217747e-13 GFOD1 gex 0.10
8.458796 2.632356e-13 TIGIT gex 0.10
8.347877 6.772147e-13 TBC1D4 gex 0.01
8.269768 1.307835e-12 GFOD1 gex 0.01
8.132492 4.097882e-12 GAPDH gex 0.01
7.982960 1.392083e-11 AC055839.2 gex 0.01
7.939167 1.983326e-11 CXXC5 gex 0.01
7.924408 2.233678e-11 VIM gex 0.10
7.838715 4.435280e-11 GAPDH gex 0.10
7.828669 4.804402e-11 PEG10 gex 0.01
7.798385 6.109926e-11 ZNF703 gex 0.01
7.777875 7.186476e-11 LYG2 gex 0.01
7.769915 7.652829e-11 PPP1CC gex 0.01
7.662239 1.780417e-10 AC108863.2 gex 0.01
7.657492 1.847447e-10 SCGB3A1 gex 0.01
7.549897 4.244521e-10 IGFBP4 gex 0.10
7.507164 5.887513e-10 IGFBP4 gex 0.01
7.471355 7.734096e-10 AC012645.3 gex 0.01
7.395838 1.369220e-09 CPM gex 0.10
Omitted 115 lines
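
The two column definitions above involve simple arithmetic, sketched here. The function names are hypothetical, and truncating K to an integer is an assumption. It is consistent with the numbers on this page: with 5453 clonotypes, nbr_frac = 0.1 gives K = 545 (546 including the center, matching num_fg in the neighbor-graph tables) and nbr_frac = 0.01 gives K = 54 (the K quoted for the clustermaps).

```python
def knn_size(nbr_frac, num_clonotypes):
    """Number of neighbors K implied by a neighbor fraction (truncation assumed)."""
    return max(1, int(nbr_frac * num_clonotypes))


def bonferroni_adjust(raw_pvalue, num_tests):
    """Crude Bonferroni correction: raw P value times the number of tests, uncapped."""
    return raw_pvalue * num_tests


num_clonotypes = 5453  # num_clonotypes from the Stats section of this page
k_large = knn_size(0.1, num_clonotypes)
k_small = knn_size(0.01, num_clonotypes)
```

Since the adjusted value is not capped at 1, it is best read as an expected number of false positives at that raw P value rather than as a probability.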

hotspot_gex_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the GEX UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) to a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: ./CoNGA.output_hotspot_combo_features_0.100_nbrs_gex_plot_umap_nbr_avg.png
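
The redundancy filter described above can be sketched as a greedy pass over the ranked features. This is an illustration, not the body of conga.plotting.plot_hotspot_umap: the function name is hypothetical, and using the signed Pearson correlation (rather than its absolute value) is an assumption.

```python
import numpy as np


def filter_correlated_features(feature_scores, ranked_names, max_feature_correlation=0.9):
    """Walk features in rank order, skipping any that correlate with a kept feature."""
    kept = []
    for name in ranked_names:
        redundant = any(
            np.corrcoef(feature_scores[name], feature_scores[prev])[0, 1]
            >= max_feature_correlation
            for prev in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy scores over four clonotypes (illustrative only).
toy_feature_scores = {
    "a": np.array([1.0, 2.0, 3.0, 4.0]),
    "b": np.array([2.0, 4.0, 6.0, 8.1]),  # nearly proportional to "a"
    "c": np.array([4.0, 1.0, 3.0, 2.0]),
}
plotted = filter_correlated_features(toy_feature_scores, ["a", "b", "c"])
```

Because the pass is greedy, the outcome depends on the ranking: the better-ranked member of a correlated group is always the one that is kept.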

hotspot_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=54 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).


Image source: ./CoNGA.output_hotspot_combo_features_0.100_nbrs_gex_plot_clustermap_nbr_avg.png

hotspot_tcr_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the TCR UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) to a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: ./CoNGA.output_hotspot_combo_features_0.100_nbrs_tcr_plot_umap_nbr_avg.png

hotspot_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=54 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: ./CoNGA.output_hotspot_combo_features_0.100_nbrs_tcr_plot_clustermap_nbr_avg.png